AN910/0297

This is preliminary information from SGS-THOMSON. Details are subject to change without notice.

APPLICATION NOTE

ST7 AND ST9 PERFORMANCE BENCHMARKING

by A. Albella, G. Bouvier and J. Pauvert

ABSTRACT

SGS-THOMSON has developed a set of test routines relevant to 8-bit and low-end 16-bit microcontroller applications to evaluate the computing performance and interrupt processing performance of microcontroller cores. These routines have been implemented on ST7 and ST9 microcontroller units (MCUs) as well as on several other MCUs available on the market.

The routines have been written in assembler language to optimize their implementation and to focus on core performance, independently of compiler code transformations.

For each test, the two parameters of interest are execution time and code size. Timings were measured whenever possible and calculated theoretically when there was no alternative. In most cases the programs were actually run and their execution times measured, so the assembly sources should not contain implementation errors and the results can be considered correct and reliable.

The results of this study point out the capability of the ST9+ to compete with 16-bit MCUs in 8-bit and low-end 16-bit applications and confirm its position as a high-end 8/16-bit MCU. They also confirm the ST7 as an outstanding 8-bit MCU.

The first four sections provide summary information:
1. Overview of the test routines
2. Overview of the MCU cores
3. Benchmark results
4. Result analysis

More detailed information is provided in the appendixes:
5. Description of MCU work environments
6. Complete numerical results
7. MCU core architecture analysis
8. Description of the test routines
9. Measurement proceeding and calculation
1 Overview of the test routines

Eleven different test routines have been implemented in assembler language.

The first ten routines are aimed at measuring core computing performance. They are based on well-known algorithms and represent operations commonly used in 8-bit and low-end 16-bit applications. They mix bit, 8-bit and 16-bit operations, as many applications do. This set of tests is described in Table 1.

Table 1. Test routine overview

sieve: Eratosthenes sieve
  Description: find prime numbers ≥ 3 out of 8189 elements
  Features stressed: 16-bit data computation, bit manipulation

acker(m,n) (1): Ackermann function
  Description: make recursive function calls; the number of calls depends upon two parameters (m,n)
  Features stressed: function calls, stack use

string: String search
  Description: search a 16-byte string in a 128-character array
  Features stressed: 8-bit data block manipulation, string manipulation

char: Character search
  Description: search a byte in a 40-byte array
  Features stressed: 8-bit data manipulation, char manipulation

bubble(n) (2): Bubble sort
  Description: sort a one-dimensional array of n 16-bit integers
  Features stressed: 16-bit data manipulation, integer manipulation

blkmov(n) (3): Block move
  Description: move an n-byte block from one place in memory to another
  Features stressed: 8-bit data block manipulation, block move

convert: Block translation
  Description: translate a 121-byte block into a different format
  Features stressed: 8-bit data manipulation, use of a lookup table

16mul: 16-bit integer multiplication
  Description: multiplication of two unsigned words giving a 32-bit result
  Features stressed: 16-bit data computation, integer manipulation

shright: 16-bit value right shift
  Description: shift a 16-bit value five places to the right
  Features stressed: 16-bit data manipulation, bit manipulation

bitsrt: Bit manipulation
  Description: set, reset, and test of 3 bits in a 128-bit array
  Features stressed: bit computation, bit and 8-bit data manipulation

Note 1. The pairs of values used are (m,n)=(3,5) and (m,n)=(3,6).
Note 2. The values used are n=10 (words) and n=600 (words).
Note 3. The values used are n=64 (bytes) and n=512 (bytes).

Another test routine, handling a timer interrupt, has been used to measure core interrupt processing performance:

interrupt: Timer interrupt
  Description: standard timer input capture and/or output compare interrupt service routine
  Features stressed: interrupt processing

A more precise description of the test routines is available in Section 8.
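To give a feel for why the acker(m,n) test stresses function calls and stack use, here is a minimal Python sketch of the Ackermann function that also counts the calls made. It is an illustration only and not the assembler routine used in the benchmark. The function name `ackermann` and the explicit `pending` list are assumptions of this sketch: the list stands in for the hardware call stack so that the example does not depend on Python's recursion limit.

```python
def ackermann(m, n):
    """Return (value, calls) for the Ackermann function A(m, n).

    'pending' holds the first arguments of calls that are still waiting
    for their second argument; it plays the role of the call stack.
    """
    pending = [m]
    calls = 0
    while pending:
        m = pending.pop()
        calls += 1
        if m == 0:
            # A(0, n) = n + 1
            n += 1
        elif n == 0:
            # A(m, 0) = A(m - 1, 1)
            pending.append(m - 1)
            n = 1
        else:
            # A(m, n) = A(m - 1, A(m, n - 1)): evaluate the inner call first
            pending.append(m - 1)
            pending.append(m)
            n -= 1
    return n, calls


if __name__ == "__main__":
    # The two parameter pairs used by the benchmark (Note 1 of Table 1)
    for m, n in [(3, 5), (3, 6)]:
        value, calls = ackermann(m, n)
        print(f"acker({m},{n}) = {value} after {calls} calls")
```

Going from (3,5) to (3,6) changes only one parameter by one, yet the amount of call and stack activity grows considerably, which is what makes this routine a useful stress test.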
2 Overview of the MCU cores

The set of MCUs evaluated is composed of various 8-bit, 8/16-bit, and 16-bit microcontrollers with accumulator, register file or mixed architectures. Table 2 is an overview of the MCU cores.

Table 2. MCU cores overview

80C51XA (Philips), 20 MHz (1)
  Architecture: 16-bit; register file
  Core: extended architecture (XA) of the 80C51s, upward compatible; 8/16-bit register bus, 16-bit data/program memory buses; register file programming model with sixteen 16-bit banked registers

68HC16 (Motorola), 16 MHz
  Architecture: 16-bit; two accumulators
  Core: architecture superset of the 68HC11s, upward compatible; accumulator programming model with two 16-bit accumulators and three 16-bit index registers (all with 4-bit extensions)

68HC12 (Motorola), 8 MHz
  Architecture: 16-bit; two accumulators
  Core: instruction set is a superset of the 68HC11s, upward compatible; programming model identical to the 68HC11s

ST9+ (SGS-THOMSON), 25 MHz
  Architecture: 8/16-bit; register file
  Core: evolution of the ST9; enhanced clock speed and instruction cycle time; enlarged memory space

ST9 (SGS-THOMSON), 12 MHz
  Architecture: 8/16-bit; register file
  Core: 8/16-bit architecture; 8-bit register bus, 16-bit memory bus; register file programming model with 14 groups of sixteen 8-bit registers, usable as 16-bit registers; modular paged registers for access to peripheral registers

H8/300 (Hitachi), 10 MHz
  Architecture: 8/16-bit; register file
  Core: RISC-like architecture and instruction set; register file programming model with sixteen 8-bit registers

68HC11 (Motorola), 4 MHz
  Architecture: 8-bit; two accumulators
  Core: market standard 8-bit MCU; accumulator programming model with two 8-bit accumulators or one 16-bit accumulator, and two 16-bit index registers

68HC08 (Motorola), 8 MHz
  Architecture: 8-bit; accumulator
  Core: superset of the 68HC05, upward compatible; enhanced performance and instruction set; accumulator programming model with one 8-bit accumulator and one 16-bit index register

ST7 (SGS-THOMSON), 4 MHz and 8 MHz
  Architecture: 8-bit; accumulator
  Core: upward compatible with the 68HC05; accumulator programming model with one 8-bit accumulator and two 8-bit index registers

80C51 (Intel, Philips...), 20 MHz
  Architecture: 8-bit; register file and accumulator
  Core: mixed accumulator and register file programming model with four banks of eight 8-bit registers (including the accumulator) and a 16-bit data pointer

KS88 (Samsung), 8 MHz
  Architecture: 8-bit; register file
  Core: architecture superset of the Super8s; 8-bit register bus; register file programming model with 192 8-bit prime data registers and two register sets with system/peripheral/data registers

78K0 (NEC), 10 MHz
  Architecture: 8-bit; register file and accumulator
  Core: mixed accumulator and register file programming model with four banks of eight 8-bit or four 16-bit registers (including the accumulator)

Note 1. As the goal is to obtain the best of each MCU core, the maximum internal frequency available on the development board has been used for each MCU (unless otherwise specified). Note that results are directly proportional to this frequency.

A description of the MCU work environments is available in Section 5.
3 Benchmark results

3.1 Core computing performance

The two following charts show benchmark results for computing performance. Execution time and code size are presented as global ratios, taking the ST9+ as reference.

Preliminary ratios have been calculated for each test. Using those results, a global execution time ratio and a global code size ratio have been calculated as the average of all ratios.

As not all the tests could be implemented on all MCUs (see 9.2.2 Memory considerations), one or two different results are presented for each MCU. The first one, available for all the MCUs, has been calculated with the reduced set of tests performed on all the MCUs. The second one, available only for some MCUs, has been calculated with the full set of tests.

Refer to Section 6 for complete results. Refer to Section 9 for a description of the measurement procedure and calculation.

Figure 1 presents execution time ratios and Figure 2 shows code size ratios.

Notes:
The reduced set of tests is composed of: string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright, bitrst.
The full set of tests is composed of: string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright, bitrst, sieve, acker(3,5), acker(3,6), bubble(600 words), blkmov(512 bytes).
The 80C51 results are preliminary results. They may change in later versions.
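The ratio calculation described above can be illustrated with a short Python sketch. It assumes that "average" means the arithmetic mean and that an execution time ratio is the ST9+ time divided by the time of the MCU under test, so that a value above 1 means faster than the ST9+. The function names are invented for this illustration; the numbers are the 80C51XA entries of Tables 5 and 6 for the reduced set of tests.

```python
from statistics import mean


def time_ratio(t_reference, t_mcu):
    """Execution time ratio against the reference MCU (assumed: reference / MCU)."""
    return t_reference / t_mcu


def global_ratio(per_test_ratios):
    """Global ratio, assumed here to be the arithmetic mean of the per-test ratios."""
    return mean(per_test_ratios)


# 80C51XA execution time ratios for the reduced set of tests (Table 6)
RATIOS_80C51XA_REDUCED = {
    "string": 0.90,
    "char": 1.14,
    "bubble(10 words)": 1.80,
    "blkmov(64 bytes)": 2.30,
    "convert": 1.54,
    "16mul": 3.60,
    "shright": 2.67,
    "bitsrt": 1.25,
}

if __name__ == "__main__":
    # sieve: 41.4 ms on the ST9+ and 25.1 ms on the 80C51XA (Table 5)
    print(f"sieve ratio: {time_ratio(41.4, 25.1):.2f}")
    print(f"global ratio: {global_ratio(RATIOS_80C51XA_REDUCED.values()):.2f}")
```

Under these assumptions the eight reduced-set ratios average to the 1.90 shown for the 80C51XA in Figure 1. Small differences can appear for other MCUs because the published per-test ratios are already rounded.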
Figure 1. Computing performance: global execution time ratios (ST9+ as reference)

Reduced set of tests:
  80C51XA (20 MHz)   1.90
  68HC16 (16 MHz)    1.54
  68HC12 (8 MHz)     1.46
  ST9+ (25 MHz)      1.00
  ST9 (12 MHz)       0.32
  H8/300 (10 MHz)    0.67
  68HC11 (4 MHz)     0.26
  68HC08 (8 MHz)     0.67
  ST7 (8 MHz)        0.46
  ST7 (4 MHz)        0.23
  80C51 (20 MHz)     0.23
  KS88 (8 MHz)       0.18
  78K0 (10 MHz)      0.21

Full set of tests (only for the MCUs on which it is available, values in chart order): 1.92, 1.47, 1.33, 1.00, 0.32, 0.64, 0.20
Figure 2. Computing performance: global code size ratios (ST9+ as reference)

Reduced set of tests:
  80C51XA (20 MHz)   0.95
  68HC16 (16 MHz)    1.04
  68HC12 (8 MHz)     0.85
  ST9+ (25 MHz)      1.00
  ST9 (12 MHz)       1.00
  H8/300 (10 MHz)    0.91
  68HC11 (4 MHz)     1.10
  68HC08 (8 MHz)     1.23
  ST7 (8 MHz)        1.21
  ST7 (4 MHz)        1.21
  80C51 (20 MHz)     1.50
  KS88 (8 MHz)       1.28
  78K0 (10 MHz)      1.03

Full set of tests (only for the MCUs on which it is available, values in chart order): 0.94, 1.03, 0.90, 1.00, 1.00, 0.98, 1.11
3.2 Core interrupt processing performance

The three following charts show benchmark results for interrupt processing performance. Execution time results are presented as time values (in microseconds) and also as ratios, taking the ST9+ as reference. Code size results are presented as ratios, taking the ST9+ as reference.

Refer to Section 6 for complete results and details on calculation.

Figure 3 presents execution time results in microseconds, showing interrupt latency and return time. Figure 4 presents execution time ratios, and Figure 5 presents code size ratios.
Figure 3. Interrupt processing performance: execution time values (µs)

Chart legend: interrupt latency & return time; interrupt routine execution time.

                     latency & return   total
  80C51XA (20 MHz)        2.25           3.80
  68HC16 (16 MHz)         2.25           5.63
  68HC12 (8 MHz)          2.63           5.13
  ST9+ (25 MHz)           1.84           3.52
  ST9 (12 MHz)            5.50          10.33
  H8/300 (10 MHz)         2.10           6.90
  68HC11 (4 MHz)          8.75          13.25
  68HC08 (8 MHz)          2.38           4.75
  ST7 (8 MHz)             3.00           5.63
  ST7 (4 MHz)             6.00          11.25
  80C51 (20 MHz)          4.60           9.40
  KS88 (8 MHz)            3.00          10.50
  78K0 (10 MHz)           4.30          13.70
Figure 4. Interrupt processing performance: execution time ratios (ST9+ as reference)

  80C51XA (20 MHz)   0.93
  68HC16 (16 MHz)    0.63
  68HC12 (8 MHz)     0.69
  ST9+ (25 MHz)      1.00
  ST9 (12 MHz)       0.34
  H8/300 (10 MHz)    0.51
  68HC11 (4 MHz)     0.27
  68HC08 (8 MHz)     0.74
  ST7 (8 MHz)        0.63
  ST7 (4 MHz)        0.31
  80C51 (20 MHz)     0.37
  KS88 (8 MHz)       0.34
  78K0 (10 MHz)      0.26
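The relation between the time values of Figure 3 and the ratios of Figure 4 can be checked with a few lines of Python. The sketch assumes that the larger of the two values printed for each MCU in Figure 3 is the total time, and that the ratio is the ST9+ total divided by the MCU total. The dictionary and function names are made up for this illustration.

```python
# Larger value printed for each MCU in Figure 3, in microseconds
# (assumed here to be the total interrupt processing time).
TOTAL_TIME_US = {
    "80C51XA": 3.80,
    "68HC16": 5.63,
    "68HC12": 5.13,
    "ST9+": 3.52,
    "ST9": 10.33,
    "H8/300": 6.90,
    "68HC11": 13.25,
    "68HC08": 4.75,
    "ST7 (8 MHz)": 5.63,
    "ST7 (4 MHz)": 11.25,
    "80C51": 9.40,
    "KS88": 10.50,
    "78K0": 13.70,
}


def interrupt_time_ratio(mcu, reference="ST9+"):
    """Execution time ratio against the reference MCU (assumed: reference / MCU)."""
    return TOTAL_TIME_US[reference] / TOTAL_TIME_US[mcu]


if __name__ == "__main__":
    for name in TOTAL_TIME_US:
        print(f"{name:12s} {interrupt_time_ratio(name):.2f}")
```

With these assumptions, the computed values round to the ratios listed in Figure 4 for all thirteen entries.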
Figure 5. Interrupt processing performance: code size ratios (ST9+ as reference)

  80C51XA (20 MHz)   1.46
  68HC16 (16 MHz)    1.85
  68HC12 (8 MHz)     0.85
  ST9+ (25 MHz)      1.00
  ST9 (12 MHz)       1.00
  H8/300 (10 MHz)    1.69
  68HC11 (4 MHz)     0.69
  68HC08 (8 MHz)     0.82
  ST7 (8 MHz)        0.85
  ST7 (4 MHz)        0.85
  80C51 (20 MHz)     1.00
  KS88 (8 MHz)       1.08
  78K0 (10 MHz)      1.15
4 Result analysis

This section analyses the computing performance and interrupt processing performance results (for execution time and code size). Based on the core architecture analysis (see Section 7), two comparisons are presented, pointing out the strong and weak points of each MCU. The first compares the high-end to medium-end MCUs with the ST9+. The second compares the medium-end to low-end MCUs with the ST7.

4.1 Preliminary remark

The results show that the two ratios, for execution time and code size, calculated with the full and reduced sets of tests are not very different. In most cases, the ranking of the MCUs is preserved. The reduced set can therefore be considered sufficient for the MCU core comparison.

4.2 High-end to medium-end MCU analysis versus ST9+

Table 3 presents the strong and weak points of the high-end to medium-end MCUs, compared to the ST9+ MCU.

Notes:
ICT means instruction cycle time and IL means instruction length. Refer to paragraph 7.2.2 Average ICT/CPI and IL for details on calculation.
Refer to paragraph 7.3.4 ST9+ MCU core for the main characteristics of the ST9+ MCU core.

4.2.1 Computing performance results

Regarding speed, the ST9+ MCU ranks at the top of the 8/16-bit MCUs. This new version of the ST9 has been improved on several points, including clocks per instruction and clock speed. These enhancements have considerably reduced its instruction cycle time.

A large and powerful register file organized in groups allows the ST9+ to perform heavy computation (with many registers), to access peripheral and I/O port registers easily (with paged registers), and to manage multitasking (with register group pointers).

Addressing modes such as register pair, register indirect with pre/post-increment, and indexed give the ST9+ the ability to perform 16-bit data computation and manipulation, and to manipulate tables and move blocks easily.

A new memory management unit enlarges the memory space up to 4 Mbytes. New instructions have been added to handle this new space and to improve C-language support.
Concerning code efficiency, the ST9+ MCU is also among the best MCUs. The 16-bit MCUs are only a little better, although they are favoured by their true 16-bit computing and data manipulation instructions. Among the 8/16-bit MCUs, the H8/300 gains a small advantage from its special block move instruction. All the 8-bit MCUs, however, even with shorter instruction lengths, produce longer code.

4.2.2 Interrupt processing performance results

Regarding speed, the ST9+ MCU ranks in first position. The value chart shows that it has the shortest interrupt latency and also an interrupt routine execution time which is among the best. These results show that its interrupt management and instruction cycle time have been considerably enhanced. In addition, the register groups bring fast context switching capabilities.

Some 8-bit MCUs, such as the 68HC08, do quite well in this test. Their performance must be put in perspective, however, because such MCUs can manage only one interrupt at a time and so avoid a complex arbitration phase.

The interrupt management of the ST9+ is one of the most advanced, allowing nested interrupts with fully software-programmable priorities and program priority level control.

Code efficiency results for interrupt processing performance are not really significant. The code represents only a very small part of an entire interrupt service routine, and so no conclusion can be drawn.

4.2.3 Conclusion

The global results and all its characteristics allow the ST9+ to compete with the true 16-bit MCUs in 8-bit and low-end 16-bit applications, and confirm its position as a high-end 8/16-bit MCU.
Table 3. High-end to low-end MCU strong and weak points

80C51XA (20 MHz)
  Strong points
    Instruction processing: 7-byte prefetch queue, predecoding
    Fast 8/16-bit ALU: 16-bit datapath, 600 ns 8x8 multiplication
    Short average ICT: 250 to 300 ns
    Special addr. modes: indirect with 8/16 offset or auto-increment
    Special instructions: compare & branch like, decrement & branch like, memory-to-memory moves
    Multitasking: context switching capabilities
    Large memory space: up to 16 Mbytes
    Interrupt processing: nested mode, 4-bit program priority register, programmable priority levels
  Weak points
    Address alignment: even jump/branch address, even word operand address, NOP instructions in assembly code
    Lacking addr. modes: no indexed addressing

68HC16 (16 MHz)
  Strong points
    Instruction processing: 3-stage prefetch queue, predecoding
    Fast 8/16/32-bit ALU: 16-bit datapath, 625 ns 8x8 multiplication
    Short average ICT: 375 to 440 ns
    Special addr. modes: post-modified indexed with 8-bit offset
    Special instructions: memory-to-memory moves
    Multitasking: context switching capabilities
    Large memory space: up to 1 Mbyte, up to 16 Mbytes with memory expansion module
    Interrupt processing: nested mode, 3-bit program priority register, programmable priority levels
  Weak points
    Address alignment: performance penalty if odd word operand addresses
    Instruction lengths: only even
    Lacking addr. modes: no direct addressing
    Lacking instructions: index register manipulation, compare & branch like, decrement & branch like

68HC12 (8 MHz)
  Strong points
    Instruction processing: 2-stage prefetch queue, predecoding
    Fast 8/16-bit ALU: 20-bit datapath, 375 ns 8x8 multiplication
    Short average ICT: 375 to 500 ns
    Special addr. modes: auto-incr/decrement indexed, accumulator offset indexed
    Special instructions: memory-to-memory moves, incr/decrement & branch like, test & branch like
    Large memory space: up to 4 Mbytes with memory expansion module
  Weak points
    Multitasking: needs memory expansion module
    Interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities

H8/300 (10 MHz)
  Strong points
    Instruction encoding: RISC-like encoding
    Short average IL: 2 to 3 bytes
    Special addr. modes: register indirect, 16-bit offset or pre/post-increment
    Special instructions: block moves
  Weak points
    Instruction processing: standard (no prefetch)
    Medium 8/16-bit ALU: 1400 ns 8x8 multiplication
    Medium average ICT: 500 to 600 ns
    Lacking instructions: 16-bit shifts/rotations, compare & branch like, decrement & branch like
    Multitasking: no special capabilities
    Memory space: 64 Kbytes
    Interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities
Table 3. High-end to low-end MCU strong and weak points (continued)

68HC11 (4 MHz)
  Weak points
    Instruction processing: standard (no prefetch)
    Medium 8/16-bit ALU: 2500 ns 8x8 multiplication
    Long average ICT: 1500 to 1750 ns
    Lacking instructions: compare & branch like, decrement & branch like
    Multitasking: no special capabilities
    Memory space: 64 Kbytes
    Interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities

68HC08 (8 MHz)
  Strong points
    Instruction processing: 1-byte prefetch queue
    Fast 8-bit ALU: 8-bit datapath, 625 ns 8x8 multiplication
    Special addr. modes: indexed with 8-bit offset or post-increment
    Special instructions: memory-to-memory moves, compare & branch like, decrement & branch like
    Large memory space: up to 4 Mbytes with memory expansion module
  Weak points
    Medium average ICT: 500 to 625 ns
    Lacking addr. modes: no indirect addressing
    Multitasking: no special capabilities
    Interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities
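The 8x8 multiplication times listed in Table 3 matter for the 16mul test because a core with only an 8x8 multiply has to build the 32-bit product of two 16-bit words from partial products. The following Python sketch shows the usual schoolbook decomposition. It is an assumption about how such a routine can be structured, not a transcription of the benchmark's assembler sources; `mul8` and `mul16` are names chosen for this example.

```python
def mul8(a, b):
    """Model of an 8x8 -> 16-bit hardware multiply."""
    if not (0 <= a <= 0xFF and 0 <= b <= 0xFF):
        raise ValueError("operands must fit in 8 bits")
    return a * b


def mul16(x, y):
    """16x16 -> 32-bit unsigned multiply built from four 8x8 partial products."""
    if not (0 <= x <= 0xFFFF and 0 <= y <= 0xFFFF):
        raise ValueError("operands must fit in 16 bits")
    x_high, x_low = x >> 8, x & 0xFF
    y_high, y_low = y >> 8, y & 0xFF
    product = mul8(x_low, y_low)            # weight 2^0
    product += mul8(x_low, y_high) << 8     # weight 2^8
    product += mul8(x_high, y_low) << 8     # weight 2^8
    product += mul8(x_high, y_high) << 16   # weight 2^16
    return product


if __name__ == "__main__":
    print(hex(mul16(0xFFFF, 0xFFFF)))
```

Four 8x8 multiplications plus the additions and carries are needed, which is one reason why the ALU width and the multiply time weigh heavily on this test.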
4.3 Medium-end to low-end MCU analysis versus ST7

Table 4 presents the strong and weak points of the medium-end to low-end MCUs, compared to the ST7 MCU.

Notes:
ICT means instruction cycle time and IL means instruction length. Refer to paragraph 7.2.2 Average ICT/CPI and IL for details on calculation.
Refer to paragraph 7.3.9 ST7 MCU core for the main characteristics of the ST7 MCU core.

4.3.1 Computing performance results

Regarding speed, the ST7 MCU takes second position, just behind the recently introduced 68HC08. Even with no prefetch mechanism, it still comes ahead of all the other MCUs. A low number of clocks per instruction combined with a standard frequency explains its short instruction cycle time and its advantageous position.

The two index registers and the indirect addressing mode allow the ST7 to perform data manipulation such as table manipulation and block moves easily. A direct addressing mode in a 256-byte zero page gives rapid access to important data and peripheral registers.

Concerning code efficiency, the ST7 MCU ranks among the 8-bit MCUs, just ahead of the 68HC08. A standard instruction length explains its average position.

4.3.2 Interrupt processing performance results

Regarding speed, the ST7 MCU ranks very close to the 68HC08. A longer instruction cycle time explains this tiny gap.

The strong point of its interrupt management is the automatic stacking of the CPU state, accumulator and index register. This process eliminates software stacking, and so saves time and space.

Code efficiency results for interrupt processing performance are not really significant. The code represents only a very small part of an entire interrupt service routine, and so no conclusion can be drawn.

4.3.3 Conclusion

The global results and all its characteristics confirm the ST7 as an outstanding 8-bit MCU.
Table 4. Medium-end to low-end MCU strong and weak points

68HC11 (4 MHz)
  Weak points
    Medium 8/16-bit ALU: 2500 ns 8x8 multiplication
    Long average ICT: 1500 to 1750 ns
    Lacking instructions: compare & branch like, decrement & branch like
    Multitasking: no special capabilities

68HC08 (8 MHz)
  Strong points
    Instruction processing: 1-byte prefetch queue
    Fast 8-bit ALU: 8-bit datapath, 625 ns 8x8 multiplication
    Short average ICT: 500 to 625 ns
    Special addr. modes: indexed with 8-bit offset or post-increment
    Special instructions: compare & branch like, decrement & branch like, memory-to-memory moves
    Large memory space: up to 4 Mbytes with memory expansion module
  Weak points
    Lacking addr. modes: no indirect addressing
    Multitasking: no special capabilities

80C51 (20 MHz)
  Strong points
    Short average IL: 1 to 2 bytes
    Special addr. modes: register indirect, stack pointer relative
    Special instructions: compare & branch like, decrement & branch like, bit test & bit clear & jump, memory-to-memory moves
    Multitasking: context switching capabilities
  Weak points
    Slow 8-bit ALU: 2400 ns 8x8 multiplication
    Long average ICT: 900 to 1000 ns

KS88 (8 MHz)
  Strong points
    Special addr. modes: register pair indirect, register/address indexed (short/long)
    Special instructions: compare & increment & branch like, decrement & branch like
    Multitasking: context switching capabilities
    Interrupt processing: nested mode, level priority control register
  Weak points
    Slow 8-bit ALU: 3000 ns 8x8 multiplication
    Long average ICT: 1250 to 1500 ns
    Data memory location: off-chip only

78K0 (10 MHz)
  Strong points
    Special addr. modes: register indirect, stack pointer relative, indexed with 8-bit offset
    Special instructions: decrement & branch like
    Multitasking: context switching capabilities
  Weak points
    Mixed architecture: only accumulator oriented
    Slow 8-bit ALU: 3200 ns 8x8 multiplication
    Long average ICT: 1400 to 1600 ns
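The times quoted in Tables 3 and 4 depend on the clock frequency used for each MCU, so it can help to convert them back to clock cycles when comparing cores. The helper below is a small sketch that assumes a time is simply a number of internal clock cycles divided by the internal frequency given in Table 2. The function name is hypothetical.

```python
def time_to_cycles(time_ns, freq_mhz):
    """Convert a duration in ns to internal clock cycles at freq_mhz.

    Assumes duration = cycles / frequency, with the internal
    frequencies listed in Table 2.
    """
    return time_ns * freq_mhz / 1000.0


if __name__ == "__main__":
    # 8x8 multiplication times from Tables 3 and 4: (time in ns, frequency in MHz)
    multiply_times = {
        "80C51XA": (600, 20),
        "68HC16": (625, 16),
        "68HC12": (375, 8),
        "H8/300": (1400, 10),
        "68HC11": (2500, 4),
        "68HC08": (625, 8),
        "80C51": (2400, 20),
        "KS88": (3000, 8),
        "78K0": (3200, 10),
    }
    for name, (time_ns, freq_mhz) in multiply_times.items():
        print(f"{name:8s} {time_to_cycles(time_ns, freq_mhz):5.1f} cycles")
```

Expressed this way, the comparison separates what comes from the core design (cycles per operation) from what comes from the clock speed chosen for the benchmark.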
5 Description of MCU work environments

This section is a short description of the work environment, with the hardware and software tools used, for each MCU during the benchmarks.

5.1 80C51XA MCU tools

Hardware tools:
P51XAG35 chip
P51XADB/E development board/emulator
Note that no external RAM was available on the development board.

Software tools:
A Microsoft Windows based integrated development environment developed by Macraigor Systems Incorporated. The tools of interest for the benchmarks were a standard editor, an XA absolute macro assembler, and an emulator interface/debugger.

5.2 68HC16 MCU tools

Hardware tools:
MC68HC16Z1 chip
M68HC16Z1EVB evaluation board
Jumpers are set to configure the board.
Note that, to access the I/O pin used for execution time measurement, a context switch is needed, which adds 6 bytes and 375 ns to each test routine. This length and time have been subtracted from the measured results, in order not to disadvantage this MCU. If they are taken into account, the computing performance results are just a little worse (1.40) but code efficiency decreases to 1.45.
Note that the external RAM of the evaluation board needs wait states and so was not used.

Software tools:
masm16 (DOS environment) is an integrated environment for writing, editing, assembling and debugging source code. It also allows the assembler options to be set, which are:
    masm -i 'name'.lst -o 'name'.o -a -b 'name'.asm >_masm16.err
evb16 is a DOS debugger for the 68HC16Z1EVB.

5.3 68HC12 MCU tools

Hardware tools:
MC68HC812A4 chip
M68HC12A4EVB evaluation board
Jumpers have been left as configured in the factory.
Note that the external RAM of the evaluation board needs wait states and so was not used.

Software tools:
The routines are developed within an integrated development environment (IDE): Motorola MCU software. In a Windows environment, this software provides a project manager (MCU project), a macro-assembler (MCU asm), and a Motorola S-record generator (hex). The compilation options are:
    masm -y -w3 -i 'name'.lst -a -o 'name'.o 'name'.asm
    hex -f 'name'.hex 'name'.o
A communications program is then necessary to connect the PC to the evaluation board through an RS232 serial link. We have used Procomm Plus for Windows, but any other communications program can handle the link to the evaluation board and its D-Bug12 monitor/debugger program, resident in external EPROM.
Note that the TBNE, TBEQ, DBNE, DBEQ, IBNE, and IBEQ instructions could not be used without problems on the board used.
5.4 ST9+ MCU tools

Hardware tools:
ST90R192 chip
Real time emulation system ST9+ HDS2 (Hardware Development System 2)
The PLL clock has been used (see configuration in assembly codes).

Software tools:
The GNU C toolchain (gcc9) for the ST9+ is used to assemble the source code (in assembler language). The command line with its options is:
    gcc9 -v -g -c -o 'name'.o 'name'.st9
It is then linked with the linker ld9:
    ld9 -i -i -m -tdata 0x10000300 -o 'name'.u 'name'.o
To debug the program, the Windows debugger wgdb9xxx for the ST9+ is used together with the emulator. Here, the configuration file hardware.gdb is the following:
    clear_map
    map 0x000000 32 sw
    map 0x008000 16 sr

5.5 ST9 MCU tools

Hardware tools:
ST90R50 chip
Real time emulation system ST9 HDS2 (Hardware Development System 2)

Software tools:
The GNU C toolchain for the ST9 is used. The options are the following:
    gcc9 -v -g -c -o 'name'.o 'name'.st9
    ld9 -i -i -m -tdata 0x10000300 -o 'name'.u 'name'.o
The Windows debugger wgdb9xxx is used with the configuration file hardware.gdb:
    bankswitch off
    pd_signal used
    sdb sr ea 3<<2
    sdb sr fc 08
    sdb sr fd 08
    sdb sr fe 00
    # mapping of memory
    map p:0x0000 0x7fff sr
    map d:0x0000 0x7fff sw

5.6 H8/300 MCU tools

Hardware tools:
H8/330 chip
LEV8330 evaluation board
Default jumper settings have been kept.
Note that the code was placed in external memory (the size of internal RAM is limited to 512 bytes). As access to external memory takes 3 times longer than access to internal memory, the measured execution time results have been corrected. For each test, a value equal to (200 ns x number of bytes executed) has been subtracted (200 ns for each byte of code). In fact, only the instruction fetch was affected: it lasted 300 ns instead of 100 ns for each byte.

Software tools:
The Eurodesc H-series interface software (intfc3) allows the user to communicate with Hitachi's Executive Monitor System (EMS) located on the development board. It uses a DOS environment.
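The correction applied to the H8/300 measurements can be written down as a one-line formula. The Python sketch below follows the description above (300 ns fetched from external memory instead of 100 ns from internal memory, hence 200 ns subtracted per byte executed). The function name and the example numbers are hypothetical and are not taken from the benchmark data.

```python
EXTERNAL_FETCH_NS = 300   # measured fetch time per byte from external memory
INTERNAL_FETCH_NS = 100   # fetch time per byte from internal memory
EXTRA_FETCH_NS = EXTERNAL_FETCH_NS - INTERNAL_FETCH_NS  # 200 ns per byte


def corrected_time_ns(measured_ns, bytes_executed):
    """Remove the external-memory fetch penalty from a measured execution time."""
    return measured_ns - EXTRA_FETCH_NS * bytes_executed


if __name__ == "__main__":
    # Hypothetical example: 10 bytes of code executed, 5000 ns measured
    print(corrected_time_ns(5000, 10))
```

Note that the count is the number of bytes executed, not the static code size: bytes inside a loop are counted once per iteration.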
5.7 68HC11 MCU tools

Hardware tools:
MC68HC11A8 chip
MC68HC11A8EVM evaluation board
Note that the internal chip frequency on the evaluation board was 2 MHz, but as 4 MHz versions are available, this frequency was used for the results (execution time values have been divided by 2).
Note that it was not possible to emulate external RAM.

Software tools:
The integrated assembler iasm11 (DOS environment) combines an editor and a cross assembler in one single environment. A DOS environment is used to debug programs.

5.8 68HC08 MCU tools

Hardware tools:
MC68HC708XL36 chip
EML08XL36 emulator module plugged into the M68MMEVS05 modular evaluation system (platform board for the EML08XL36)
Jumpers configure both.

Software tools:
rapid, a software development tool in a DOS environment, allows all the operations to be executed. It consists of a configuration program (rinstall) and a cross assembler (casm). rinstall contains a series of data entry screens. Only casm and the mmev08x DOS debugger were configured, as follows:
- Cross assembler configuration: casm assembler entry screen
    name and full path: 'path_of_casm08.exe'
    primary options: s l d
    secondary options: s l d i
- Debugger configuration: debugger entry screen
    full path: 'path_of_mmevs08.exe'
    options: -b
Note that the assembler does not seem to manage the zero page addressing mode. Thus, the results have been modified to take this addressing mode into account. Without the zero page addressing mode, the execution time result changes to 0.61 and the code size result increases to 1.43.

5.9 ST7 MCU tools

Hardware tools:
ST7275 chip
ST7 HDS (Hardware Development System) emulator with ST7275 DBE (Dedicated Board Emulator)
Note that measurements have been made with a 4 MHz MCU, but as 8 MHz versions exist, two values are presented for the two frequencies (for the 8 MHz version, execution time values have been divided by 2).

Software tools:
The toolchain used for the ST7 includes a meta-assembler (asm), a generic linker (lyn), and a generic formatter (obsend). These software tools are used with the following options:
    asm -sym -li 'name'
    lyn 'name'
    asm 'name' -fi='name'.map
    obsend 'name',f,'name'.s19,srec
The debugger runs in the Windows environment: Windows debugger wgdb7.
5.10 80C51 MCU tools

Hardware tools:
P80C32GBPN chip
Microtek EasyPack 8051 serial emulator
Note that the internal chip frequency on the evaluation board was 12 MHz, but as 20 MHz versions are available, this frequency was used for the results (execution time values have been divided by 20/12).

Software tools:
IAR 8051 assembler

5.11 KS88 MCU tools

Hardware tools:
KS880504 and KS880116 chips
SMDS II in-circuit emulator (Samsung Microcontroller Development System 2) with target boards TB880504A and TB880116A
A function generator has been used to reach the 8 MHz frequency. It has been connected to the personality board in the SMDS2 emulator after selecting the extra clock source with the switches on the front panel.
Note that this MCU does not have any internal RAM, apart from the register file space. It was also impossible to emulate external memory. Tests have been performed using the register file only.

Software tools:
Everything is done from the SMDS operating program software (DOS environment). sama (Samsung assembler) is used to assemble the programs with the following command line and options:
    sama.exe %s /k /lst
Then, the program is loaded into SMDS2 memory (emulation memory) and a work file is made ([m] key). The debugging screen is accessed with the [d] key.

5.12 78K0 MCU tools

Hardware tools:
µPD78P014 chip
78K0 starter kit
Note that it was not possible to emulate external RAM.

Software tools:
The µPD78P014 toolchain consists of a micro series assembler (a78000) and a micro series generic linker (xlink). The command lines are as follows:
    a78000 'name'.asm 'name'.lst
    xlink 'name' -o 'name'.o -f bench.xcl
The file bench.xcl extends the xlink command line. The extra options included in bench.xcl are:
    -c78000
    -fnec
    -z(code)intvec=8000
    -z(code)code=8080
    -z(data)data=fb00
    -z(data)wrkseg,shortad=fe20-fedf
    -z(bit)bitvars=0
    -y2
The 78K0 starter kit has a DOS environment.
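Several of the work environments above rescale measured times to a frequency other than the one available on the board (68HC11: 2 MHz to 4 MHz, ST7: 4 MHz to 8 MHz, 80C51: 12 MHz to 20 MHz). The following Python sketch shows the rescaling under the assumption, stated in Note 1 of Table 2, that results are directly proportional to the internal frequency. The function name is chosen for this illustration.

```python
def rescale_time(measured, f_measured_mhz, f_target_mhz):
    """Rescale an execution time measured at f_measured_mhz to f_target_mhz.

    Assumes execution time is inversely proportional to the internal
    clock frequency (same number of cycles at both frequencies).
    """
    return measured * f_measured_mhz / f_target_mhz


if __name__ == "__main__":
    # ST7 bubble(10 words): 2.18 ms at 4 MHz (Table 5), rescaled to 8 MHz
    print(f"{rescale_time(2.18, 4, 8):.2f} ms")
    # 80C51: a time measured at 12 MHz is divided by 20/12 for the 20 MHz result
    print(f"{rescale_time(1.0, 12, 20):.2f} (per unit of measured time)")
```

This assumption holds only as long as memory accesses need no additional wait states at the higher frequency.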
6 Complete numerical results

Here are the tables with the complete numerical results.

6.1 Core computing performance

The first two tables (Table 5 and Table 6) concern execution time, with the values measured in milliseconds and the ratios calculated with the ST9+ MCU as reference.

The next two tables (Table 7 and Table 8) concern code size, with the values measured in bytes and the ratios calculated with the ST9+ MCU as reference.

The last two tables (Table 9 and Table 10) present global execution time ratios and global code size ratios with the reduced and full sets of tests.

Refer to Section 9 for a description of the measurement procedure and calculation.

Notes:
The reduced set of tests includes the string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright and bitrst tests. They are in boldface characters.
Numbers in parentheses have been judged out of range and have not been taken into account. This means that the specific test was entirely unsuited to the specific MCU. Only some tests that are not included in the reduced set are concerned.

6.2 Core interrupt processing performance

Table 11 concerns execution time, with the values measured in microseconds, showing interrupt latency and return time, the total time, and the ratios calculated with the ST9+ MCU as reference.

Table 12 concerns code size, with the values measured in bytes and the ratios calculated with the ST9+ MCU as reference.

The execution time has only been calculated theoretically from the assembly code, like the theoretical computing performance execution times (see 9.1.1 Execution time measure). The result is the sum of the interrupt latency (execution time of the longest instruction plus interrupt entry time) and the execution time of the interrupt service routine. The code size has been calculated from the assembly code.

Legend: s x.xx = best results, t x.xx = worst results
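The theoretical calculation described in 6.2 can be sketched as follows in Python. The decomposition (longest instruction plus interrupt entry, then the service routine) follows the text above. The function name and the cycle counts in the example are hypothetical and do not correspond to any of the benchmarked MCUs.

```python
def interrupt_times_us(longest_instruction_cycles, entry_cycles,
                       routine_cycles, freq_mhz):
    """Return (latency_us, total_us) from cycle counts and internal frequency.

    latency = longest instruction + interrupt entry
    total   = latency + interrupt service routine
    """
    latency_us = (longest_instruction_cycles + entry_cycles) / freq_mhz
    routine_us = routine_cycles / freq_mhz
    return latency_us, latency_us + routine_us


if __name__ == "__main__":
    # Hypothetical core at 8 MHz: 6-cycle longest instruction,
    # 10-cycle interrupt entry, 20-cycle service routine
    latency, total = interrupt_times_us(6, 10, 20, 8)
    print(f"latency = {latency} us, total = {total} us")
```

Using the longest instruction gives a worst-case latency, since an interrupt request can arrive just after that instruction has started.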
columns for table 5 and table 6, in order: 80c51xa (20 mhz), 68hc16 (16 mhz), 68hc12 (8 mhz), st9+ (25 mhz), st9 (12 mhz), h8/300 (10 mhz), 68hc11 (4 mhz), 68hc08 (8 mhz), st7 (8 mhz), st7 (4 mhz), 80c51 (1) (20 mhz), ks88 (8 mhz), 78k0 (10 mhz)
(1) the 80c51 results are preliminary results. they may change in later versions.

table 5. computing performance execution time measures (ms)
1 sieve: s 25.1 27.8 47.5 41.4 142 147 t 244
2 acker(3,5): s 148 224 230 268 868 916 950 1,280 t 1,400
3 acker(3,6): s 602 920 936 1,090 3,530 3,720 3,850 5,190 t 5,690
4 string: 0.178 0.157 s 0.15 0.160 0.514 0.369 0.54 0.264 0.345 0.690 t 1.17 0.796 0.744
5 char: 0.042 0.039 s 0.037 0.048 0.149 0.071 0.140 0.039 0.070 0.140 0.142 t 0.276 0.216
6 bubble(10 words): s 0.170 0.223 0.328 0.306 0.988 0.741 1.33 1.14 1.09 2.18 1.99 t 2.39 1.46
7 bubble(600 words): s 638 968 1,280 1,190 3,830 3,750 5,130 4,280 t 8,560 6,440
8 blkmov(64 bytes): s 0.025 0.035 0.037 0.057 0.174 0.036 0.259 0.078 0.153 0.305 0.233 t 0.484 0.260
9 blkmov(512 bytes): s 0.167 0.272 0.289 0.452 1.36 0.261 2.05 1.34 2.67 (8.61) t 3.84 3.28
10 convert: s 0.146 0.227 0.288 0.223 0.766 0.397 0.82 0.265 0.452 0.904 0.584 1.03 t 1.06
11 16mul: 0.0019 0.0017 s 0.0016 0.0068 0.020 0.012 0.029 0.013 0.018 0.037 0.035 0.032 t 0.040
12 shright: s 0.0013 0.0038 0.0046 0.0034 0.011 0.010 0.017 0.0072 0.010 0.020 t 0.031 0.022 0.020
13 bitsrt: s 0.047 0.050 0.055 0.059 0.178 0.071 0.215 0.086 0.092 0.183 0.203 t 0.283 0.204

table 6. computing performance execution time ratios
1 sieve: s 1.65 1.49 0.87 1.00 0.29 0.28 t 0.17
2 acker(3,5): s 1.81 1.20 1.16 1.00 0.31 0.29 0.28 0.21 t 0.19
3 acker(3,6): s 1.81 1.18 1.16 1.00 0.31 0.29 0.28 0.21 t 0.19
4 string: 0.90 1.02 s 1.05 1.00 0.31 0.43 0.30 0.61 0.46 0.23 t 0.14 0.20 0.22
5 char: 1.14 1.23 s 1.28 1.00 0.32 0.67 0.34 1.23 0.68 0.34 0.34 t 0.17 0.22
6 bubble(10 words): s 1.80 1.37 0.93 1.00 0.31 0.41 0.23 0.27 0.28 0.14 0.15 t 0.13 0.21
7 bubble(600 words): s 1.87 1.23 0.93 1.00 0.31 0.32 0.23 0.28 t 0.14 0.19
8 blkmov(64 bytes): s 2.30 1.65 1.56 1.00 0.33 1.57 0.22 0.74 0.38 0.19 0.25 t 0.12 0.22
9 blkmov(512 bytes): s 2.71 1.66 1.56 1.00 0.33 1.73 0.22 0.34 0.17 (0.052) t 0.12 0.14
10 convert: s 1.54 0.98 0.78 1.00 0.29 0.56 0.27 0.84 0.49 0.25 0.38 0.22 t 0.21
11 16mul: 3.60 3.92 s 4.22 1.00 0.35 0.56 0.23 0.52 0.37 0.19 0.20 0.22 t 0.17
12 shright: s 2.67 0.92 0.75 1.00 0.30 0.35 0.20 0.48 0.34 0.17 t 0.11 0.16 0.17
13 bitsrt: s 1.25 1.18 1.08 1.00 0.33 0.83 0.27 0.69 0.65 0.32 0.29 t 0.21 0.29
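the execution time ratios are consistent with dividing the st9+ time by the time of the mcu under test, so a ratio above 1.00 means faster than the st9+. the sketch below recomputes a few ratios of the string test from the table 5 values. the function and variable names are illustrative; because the published times are rounded, a ratio recomputed this way can differ from the published one in the last digit.

```python
# String test execution times in ms, copied from table 5.
ST9_PLUS_MS = 0.160
STRING_MS = {
    "80c51xa": 0.178,
    "68hc16": 0.157,
    "st9": 0.514,
    "h8/300": 0.369,
    "st7 (8 mhz)": 0.345,
    "80c51": 1.17,
}


def time_ratio(reference_ms, mcu_ms):
    """Ratio relative to the reference MCU: higher means faster."""
    return reference_ms / mcu_ms


ratios = {name: round(time_ratio(ST9_PLUS_MS, ms), 2)
          for name, ms in STRING_MS.items()}

for name, ratio in ratios.items():
    print(f"{name:12s} {ratio:.2f}")
```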
columns for table 7 and table 8, in order: 80c51xa (20 mhz), 68hc16 (16 mhz), 68hc12 (8 mhz), st9+ (25 mhz), st9 (12 mhz), h8/300 (10 mhz), 68hc11 (4 mhz), 68hc08 (8 mhz), st7 (8 mhz), st7 (4 mhz), 80c51 (1) (20 mhz), ks88 (8 mhz), 78k0 (10 mhz)
(1) the 80c51 results are preliminary results. they may change in later versions.

table 7. computing performance code size measures (bytes)
1 sieve: 49 68 73 s 48 s 48 54 t 74
2 acker(3,5): 73 68 s 62 88 88 86 80 t 122 94
3 acker(3,6): 73 68 s 62 88 88 86 80 t 122 94
4 string: 57 52 s 43 50 50 52 54 61 53 53 t 76 54 54
5 char: 31 26 21 29 29 28 s 20 22 22 22 t 61 35 27
6 bubble(10 words): 41 44 s 40 44 44 42 57 106 88 88 t 155 69 39
7 bubble(600 words): 41 44 s 40 44 44 42 57 (764) (764) t 71
8 blkmov(64 bytes): 18 20 15 17 17 12 13 13 14 14 s 12 t 22 14
9 blkmov(512 bytes): 18 20 19 17 17 24 13 t 44 t 44 s 12 22 16
10 convert: 24 t 32 22 23 23 22 29 s 14 22 22 16 25 17
11 16mul: 10 10 s 7 44 44 40 62 t 66 t 66 t 66 55 49 58
12 shright: s 8 14 11 10 10 12 14 t 16 15 15 14 t 16 15
13 bitsrt: 340 304 310 261 261 s 138 233 260 290 290 219 t 343 256

table 8. computing performance code size ratios
1 sieve: 1.02 1.42 1.52 s 1.00 s 1.00 1.13 t 1.54
2 acker(3,5): 0.83 0.77 s 0.71 1.00 1.00 0.98 0.91 t 1.39 1.07
3 acker(3,6): 0.83 0.77 s 0.71 1.00 1.00 0.98 0.91 t 1.39 1.07
4 string: 1.14 1.04 s 0.86 1.00 1.00 1.04 1.08 1.22 1.06 1.06 t 1.52 1.08 1.08
5 char: 1.07 0.90 0.72 1.00 1.00 0.97 s 0.69 0.76 0.76 0.76 t 2.10 1.21 0.93
6 bubble(10 words): 0.93 1.00 s 0.91 1.00 1.00 0.96 1.30 2.41 2.00 2.00 t 3.52 1.57 0.89
7 bubble(600 words): 0.93 1.00 s 0.91 1.00 1.00 0.96 1.30 (17.4) (17.4) t 1.61
8 blkmov(64 bytes): 1.06 1.18 0.88 1.00 1.00 0.71 0.77 0.77 0.82 0.82 s 0.71 t 1.29 0.84
9 blkmov(512 bytes): 1.06 1.18 1.12 1.00 1.00 1.41 0.77 t 2.60 t 2.60 s 0.71 1.29 0.94
10 convert: 1.04 1.40 0.96 1.00 1.00 0.96 1.26 s 0.61 0.96 0.96 0.70 1.09 0.74
11 16mul: 0.23 0.23 s 0.16 1.00 1.00 0.91 1.41 t 1.50 t 1.50 t 1.50 1.25 1.11 1.32
12 shright: s 0.80 1.40 1.10 1.00 1.00 1.20 1.40 t 1.60 1.50 1.50 1.40 t 1.60 1.50
13 bitsrt: 1.30 1.17 1.19 1.00 1.00 s 0.53 0.89 1.00 1.11 1.11 0.84 t 1.31 0.98
columns for tables 9 to 12, in order: 80c51xa (20 mhz), 68hc16 (16 mhz), 68hc12 (8 mhz), st9+ (25 mhz), st9 (12 mhz), h8/300 (10 mhz), 68hc11 (4 mhz), 68hc08 (8 mhz), st7 (8 mhz), st7 (4 mhz), 80c51 (20 mhz), ks88 (8 mhz), 78k0 (10 mhz)

table 9. computing performance global execution time ratios
(1) the 80c51 results are preliminary results. they may change in later versions.
with reduced set of tests: s 1.90 1.54 1.46 1.00 0.32 0.67 0.26 0.67 0.46 0.23 0.23 t 0.18 0.21
with full set of tests: s 1.92 1.47 1.33 1.00 0.32 0.64 t 0.20

table 10. computing performance global code size ratios
(1) the 80c51 results are preliminary results. they may change in later versions.
with reduced set of tests: 0.95 1.04 s 0.85 1.00 1.00 0.98 1.10 1.24 1.21 1.21 t 1.50 1.28 1.03
with full set of tests: 0.94 1.03 s 0.90 1.00 1.00 1.04 t 1.11

table 11. interrupt processing performance execution time values (µs) and ratios
interrupt latency & return: 3.15 4.19 3.75 s 2.40 7.17 3.90 t 17.25 2.88 3.88 7.75 8.40 3.00 4.30
execution time values: 4.70 7.56 6.25 s 4.08 12.00 8.70 t 21.75 5.25 6.50 13.00 10.80 10.50 13.70
execution time ratios: 0.87 0.54 0.65 s 1.00 0.34 0.47 t 0.19 0.78 0.63 0.31 0.38 0.34 0.26

table 12. interrupt processing performance code size values and ratios
code size values (bytes): 28.5 t 36 16.5 19.5 19.5 33 s 13.5 16 16.5 16.5 19.5 21 22.5
code size ratios: 1.46 t 1.85 0.85 1.00 1.00 1.70 s 0.69 0.82 0.85 0.85 1.00 1.08 1.15
7 mcu core architecture analysis

this section presents, for the different mcus, the main parameters of the core architecture which are significant for benchmark result analysis.

7.1 parameter description

the significant parameters of core architecture are the following ones:

on-chip/off-chip buses
- on-chip buses: address bus size; data/program memory bus sizes; register bus size (if any)
- off-chip buses (if any): address bus size; data/program memory bus size; multiplexing

memory spaces: harvard organization or von neumann organization
- special register space (if any)
- data/program memory spaces
- interrupt vector table location and size

arithmetic logic unit: datapath size
- standard operations
- special functions and performance

instruction processing: standard or prefetch mechanism
- queue size
- predecoding (if any)
- address alignment

cpu internal buses
- address bus size, data bus size
- register bus (if any)

instruction set: cisc/risc encoding
- clock per instruction (cpi)
- average clock per instruction
- instruction length (il)
- average instruction length
- special addressing modes
- special instructions

programming model: register file, accumulator(s), or mixed register file/accumulator
- list of registers (they may be outside the cpu)
- multitasking capabilities

diagram labels: cpu; alu symbols + / x; sample instructions move rd,rs and add rd,#2 (register file), ldaa #8,x and adda #a0 (accumulator)
7.2 remarks on some parameters

7.2.1 instruction processing

only two different kinds of instruction processing exist:
- standard processing: the current instruction is completely processed before the next one is fetched
- prefetch mechanism: some of the next opcodes are prefetched while the current instruction is processed

the prefetch mechanism is best described as a queue rather than as a pipeline. queue logic fetches program information and positions it for execution, but instructions are executed sequentially. a typical pipelined cpu executes more than one instruction at the same time.

the queue size is given, but the performance gain is not specified because the databooks give no value. nevertheless, general statistics on instruction processing mechanisms give a usual average gain of 20%-25% for one stage, and this gain is not more than 25%-30% for two stages. additional stages without complex mechanisms do not give a higher gain. in any case, the instruction processing mechanism has a leading role in general performance.

7.2.2 average ict/cpi and il

the average ict (instruction cycle time) is a commonly used parameter, but it depends on the frequency f, so we prefer the average cpi (clock per instruction) to describe the instruction set. on the other hand, to compare mcu core performance, the frequency has to be considered, and so the average ict is used in the result analysis (section 4). charts with ict and il ranges are presented at the end of this section (see 7.4 instruction cycle time chart and 7.5 instruction length chart).

note that the average ict (in µs) is the inverse of the mips parameter (million instructions per second), and so we have the formula:

$$\mathrm{mips} = \frac{f}{\mathrm{cpi}} = \frac{1}{\mathrm{ict}}$$

(f is in mhz and ict is in µs)

the average ict/cpi and average il have been calculated considering all available instructions and all possible addressing modes, favouring those most used in the test routines. ranges are presented instead of decimal values, to take the subjectivity of the calculation into account. thus the values can be considered as reliable.
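as a worked instance of the relation between f, cpi, ict and mips, the sketch below converts the average cpi range given for the st9+ in 7.3.4 (10 to 12 cycles) into ict and mips at 25 mhz. it assumes that the cycles counted by the cpi are periods of the quoted clock frequency, which this section does not state explicitly; the function names are illustrative.

```python
def ict_us(cpi, f_mhz):
    """Instruction cycle time in microseconds: ict = cpi / f (f in MHz)."""
    return cpi / f_mhz


def mips(cpi, f_mhz):
    """Million instructions per second: mips = f / cpi = 1 / ict."""
    return f_mhz / cpi


F_MHZ = 25.0               # st9+ frequency used in the result tables
AVG_CPI_RANGE = (10, 12)   # st9+ average cpi range from 7.3.4

for cpi in AVG_CPI_RANGE:
    # Assumption: one "cycle" is one period of the 25 MHz clock.
    print(f"cpi={cpi}: ict={ict_us(cpi, F_MHZ):.2f} us, "
          f"mips={mips(cpi, F_MHZ):.2f}")
```

under that assumption the st9+ average ict would lie between 0.40 and 0.48 µs, that is roughly 2.1 to 2.5 mips.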
7.2.3 special addressing modes and instructions

analysis of the assembly code of the test routines has pointed out that some addressing modes and instructions can significantly reduce the code size. to a lesser extent, execution time may also be decreased.

the addressing modes and instructions concerned are usually those which perform two operations within a single instruction. indirect addressing with pre/post-increment is one example: this mode is very useful for loops and block moves. modes allowing memory-to-memory transfers are another example for block moves. in the same way, instructions such as bit test & set, decrement & branch, or compare & branch have stood out for the same reasons.

these addressing modes and instructions are mentioned in the tables as special addressing modes and special instructions.

7.3 mcu core analysis

the following paragraphs are summary diagrams presenting the main parameters of the core architecture for each mcu. those parameters have been synthesized from the databooks. some special characteristics are also mentioned, even if they are not really significant for the benchmark result analysis.
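the code size effect described in 7.2.3 can be illustrated by counting the instructions of a block-move loop body. the three loop bodies below are hypothetical: the mnemonics do not belong to any of the benchmarked mcus and only show how post-increment addressing and combined instructions fold several operations into one.

```python
# Hypothetical loop bodies, one list entry per instruction.
PLAIN = [
    "load a,(src)", "store (dst),a",
    "inc src", "inc dst",
    "dec count", "branch if not zero",
]
POST_INCREMENT = [
    "load a,(src)+", "store (dst)+,a",   # pointer update folded in
    "dec count", "branch if not zero",
]
COMBINED = [
    "move (dst)+,(src)+",                # memory-to-memory move
    "decrement & branch",
]


def executed_instructions(body, n_bytes):
    """Instructions executed to move n_bytes, one byte per iteration."""
    return len(body) * n_bytes


for name, body in (("plain", PLAIN),
                   ("post-increment", POST_INCREMENT),
                   ("combined", COMBINED)):
    print(f"{name:15s} body={len(body)} instr, "
          f"64 bytes={executed_instructions(body, 64)} instr")
```

the count says nothing about cycles per instruction, which is why the effect on execution time is smaller than the effect on code size.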
7.3.1 80c51xa mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus (up to 24-bit); 8/16-bit data memory bus; 8/16-bit program memory bus; 8/16-bit sfr bus
- off-chip buses: 8/16-bit address bus (up to 24-bit); 8/16-bit multiplexed sfr/data/program mem. bus; the two buses may be multiplexed; the two buses are multiplexed with ports

memory spaces: harvard organization
- segmented data/program memory spaces
- data memory space: up to 255 segments of 64 kbytes each = 16 mbytes; 1-kbyte zero page/segment (32 bytes bit addr.)
- special function register space (logically separate): 512 bytes of on-chip registers (64 bytes bit addr.); 512 bytes of off-chip registers
- program memory space: up to 255 segments of 64 kbytes each = 16 mbytes; first 284-byte interrupt vector table = 71 interrupts

arithmetic logic unit: 16-bit datapath
- 8/16-bit operations
- special functions: 8x8 unsigned multiplication 12 cycles; 16x16 (un)signed multiplications 12 cycles; 8/8 unsigned division 12 cycles; 16/8 (un)signed divisions (12)14 cycles; 32/16 (un)signed divisions (22)24 cycles; 32-bit shifts 6 cycles

instruction processing: prefetch mechanism
- 7-byte queue
- predecoding
- jump/branch address even alignment: addition of some 1-byte nop instructions
- word operand even alignment: addition of some 1-byte nop instructions

cpu internal buses
- 16-bit mux. address/data/control bus
- 8/16-bit sfr bus (special function register)

instruction set: cisc encoding
- cpi: 2 cycles to 24 cycles
- average cpi: between 5 and 6 cycles
- il: 2 bytes to 6 bytes
- average il: between 3 and 4 bytes
- special addressing modes: register access as bit, word, or doubleword; immediate with 11-bit addresses; indirect with 8/16-bit offset or auto-increment
- special instructions: exchange register contents; push/pull multiple registers; memory-to-memory moves (register indirect to reg. ind., both auto-increment); compare & branch like; decrement & branch like

programming model: register file
- banked registers: 4 banks of four 16-bit registers
- global registers: four 16-bit registers (up to 12)
- other registers: 16-bit program counter (up to 24-bit); two 8-bit segment registers; 16-bit system and user stack pointers
- special function registers: program status word, system configuration register; segment select register; data/extra/code segment registers; on-chip/off-chip peripheral and i/o port registers
- multitasking capabilities: context switching with banked registers; system and user modes

diagram labels: 80c51xa cpu; alu symbols + / x; sample instructions move rd,rs and add rd,#2
7.3.2 68hc16 mcu core

memory spaces: harvard organization
- pseudo-linear data/program memory space
- data memory space: 16 banks of 64 kbytes each = 1 mbyte; peripheral registers in last segment
- program memory space: 16 banks of 64 kbytes each = 1 mbyte; first 512-byte interrupt vector table = 207 interrupts

on-chip/off-chip buses
- on-chip buses: 16-bit address bus + 4-bit extension (= 20 bits), extensible up to 24 bits; 8/16-bit multiplexed data/program memory bus
- off-chip buses: 16-bit address bus + 4-bit extension (= 20 bits), extensible up to 24 bits; 8/16-bit multiplexed data/program memory bus; the two buses are multiplexed with ports

arithmetic logic unit: 16-bit datapath
- 8/16/32-bit operations
- special functions: 8x8 unsigned multiplication 10 cycles; 16x16 (un)signed multiplications (8)10 cycles; 16x16 fractional signed multiplication 8 cycles; 32/16 (un)signed divisions (24)38 cycles; 16/16 fractional unsigned division 22 cycles; 16/16 integer division 22 cycles; mac signed 16-bit fractions 12 cycles; r(epeat) mac signed 16-bit fractions 6+12n cycles

instruction processing: prefetch mechanism
- 3-stage queue: stage a, latched opcode; stage b, executing opcode; stage c, hold opcode
- predecoding
- word operand even/odd alignment: substantial performance penalty if odd alignment

cpu internal buses
- 16-bit address bus, 16-bit data bus (to be confirmed)

instruction set: cisc encoding
- cpi: 2 cycles to 38 cycles
- average cpi: between 6 and 7 cycles
- il: 2 bytes to 6 bytes (even)
- average il: between 3 and 4 bytes
- special addressing modes: accumulator offset; indexed with 8/16/20-bit offset; post-modified indexed mode with 8-bit offset
- special instructions: 32-bit long integer manipulations; exchange register contents; push/pull multiple registers; memory-to-memory moves (extended <-> post-modified indexed, extended to extended); mac and r(epeat)mac instructions

programming model: accumulators
- two 16-bit accumulators: useable as one 32-bit accumulator; first addressable as two 8-bit registers
- three 16-bit index registers with 4-bit extension
- other registers: 16-bit program counter (with 4-bit extension); 16-bit stack pointer (with 4-bit extension); condition code register; two 16-bit & one 36-bit & one 16-bit mac registers (operand registers, result register, mask register)
- extension fields: four 4-bit index address extension fields; one 4-bit stack address extension field
- multitasking capabilities: context switching with extension fields

diagram labels: 68hc16 cpu; alu symbols + / x; sample instructions ldaa #8,x and adda #a0
7.3.3 68hc12 mcu core

memory spaces: von neumann organization
- linear data/program memory space: 64 kbytes with first 256-byte zero page; peripheral registers in zero page; upper 128-byte interrupt vector table = 64 interrupts
- memory extension (harvard organization): program/data/extra mem. windows in linear space; up to 4-mbyte memory space/window

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8/16-bit data/program memory bus
- off-chip buses: 16-bit address bus, up to 22 bits with memory expansion module; 8/16-bit data/program memory bus; the two buses are multiplexed with ports

arithmetic logic unit: 20-bit datapath
- 8/16-bit operations
- special functions: 8x8 unsigned multiplication 3 cycles; 16x16 (un)signed multiplications 3 cycles; 32/16 (un)signed divisions (11)12 cycles; 16/16 unsigned fractional division 12 cycles; 16/16 (un)signed integer divisions 12 cycles; min/max of two 16-bit values 4 to 7 cycles; mac signed 16x16 to 32-bit mem. 13 cycles; 8/16-bit table lookup and interpolate 10 cycles; (un)weighted product sum 8n cycles

instruction processing: prefetch mechanism
- 2-stage queue: 2-word instruction queue; 16-bit holding buffer if queue is full
- predecoding
- word operand even/odd alignment: no performance penalty if odd alignment

cpu internal buses
- 16-bit address bus, 16-bit data bus (to be confirmed)

instruction set: cisc encoding
- cpi: 1 cycle to 13 cycles
- average cpi: between 3 and 4 cycles
- il: 1 byte to 5 bytes
- average il: between 3 and 4 bytes
- special addressing modes: auto pre/post-increment/decrement indexed; stack pointer and program counter indexed; indexed-indirect with 16-bit offset; accumulator offset indexed
- special instructions: exchange register contents; increment/decrement/test & branch like; memory-to-memory moves (extended to extended); mac & min/max instructions; fuzzy logic support, table lookup and interpolate

programming model: accumulators
- two 8-bit accumulators useable as one 16-bit accumulator
- two 16-bit index registers
- other registers: 16-bit program counter; 16-bit stack pointer; condition code register
- multitasking capabilities with memory expansion module: context switching with program page register and program/data/extra windows; specific call and rtc instructions

diagram labels: 68hc12 cpu; alu symbols + / x; sample instructions ldaa #8,x and adda #a0
7.3.4 st9+ mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8/16-bit data/program memory bus; 8-bit register bus
- off-chip buses: 8/16-bit address bus, up to 22-bit with memory management unit; 8-bit multiplexed data/program memory bus; the two buses may be multiplexed; the two buses are multiplexed with ports

memory spaces: harvard organization
- register file space: 224 bytes of general purpose registers; system, on-chip peripheral, and i/o port registers
- linear data/program memory space
- data memory space: up to 256 segments of 16 kbytes each = 4 mbytes
- program memory space: up to 64 segments of 64 kbytes each = 4 mbytes; 256-byte interrupt vector table = 128 interrupts, user-programmable location

cpu internal buses
- 16-bit address 8-bit data multiplexed bus

instruction set: cisc encoding
- cpi: 2 cycles to 26 cycles
- average cpi: between 10 and 12 cycles
- il: 1 byte to 6 bytes
- average il: between 3 and 4 bytes
- special addressing modes: bit access to whole register file; register pair (two 8-bit registers as one 16-bit); register direct/indirect; indirect with pre/post-increment; indexed (short, long, register, memory)
- special instructions: exchange register contents; bit test & set; decrement & branch like; memory-to-memory moves (register indirect to reg. ind., both post-increment)

instruction processing: prefetch mechanism
- next byte prefetching as soon as instruction register is available and address is known

arithmetic logic unit: 8-bit datapath
- 8/16-bit operations
- special functions: 8x8 unsigned multiplication 22 cycles; 16/8 unsigned divisions 26/14 cycles; 32/16 stepped unsigned divisions 26 cycles

programming model: register file
- general purpose registers: 14 groups of sixteen 8-bit registers
- system registers: one group of sixteen 8-bit registers (flags, central interrupt control register; user/system stack pointers; mode register, page pointer; 2 register group pointers; i/o port data registers)
- paged registers: on-chip peripheral data and control registers; up to 64 pages of sixteen 8-bit registers
- 16-bit program counter
- multitasking capabilities: context switching with register group pointers

diagram labels: st9+ cpu; alu symbols + / x; sample instructions ldw rrd,rrs and add rd,#2
7.3.5 st9 mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8/16-bit data/program memory bus; 8-bit register bus
- off-chip buses: 8/16-bit address bus; 8-bit multiplexed data/program memory bus; the two buses may be multiplexed; the two buses are multiplexed with ports

memory spaces: harvard organization
- register file space: 224 bytes of general purpose registers; system, on-chip peripheral, and i/o port registers
- linear data/program memory space
- data memory space: up to 64 kbytes
- program memory space: up to 64 kbytes; first 256-byte interrupt vector table = 128 interrupts

cpu internal buses
- 16-bit address 8-bit data multiplexed bus

instruction set: cisc encoding
- cpi: 6 cycles to 38 cycles
- average cpi: between 16 and 18 cycles
- il: 1 byte to 6 bytes
- average il: between 3 and 4 bytes
- special addressing modes: bit access to whole register file; register pair (two 8-bit registers as one 16-bit); register direct/indirect; indirect with pre/post-increment; indexed (short, long, register, memory)
- special instructions: exchange register contents; bit test & set; decrement & branch like; memory-to-memory moves (register indirect to reg. ind., both post-increment)

instruction processing: prefetch mechanism
- next byte prefetching as soon as instruction register is available and address is known

arithmetic logic unit: 8-bit datapath
- 8/16-bit operations
- special functions: 8x8 unsigned multiplication 22 cycles; 16/8 unsigned divisions 28/20 cycles; 32/16 stepped unsigned divisions 28 cycles

programming model: register file
- general purpose registers: 14 groups of sixteen 8-bit registers
- system registers: one group of sixteen 8-bit registers (flags, central interrupt control register; user/system stack pointers; mode register, page pointer; 2 register group pointers; i/o port data registers)
- paged registers: on-chip peripheral data and control registers; up to 64 pages of sixteen 8-bit registers
- 16-bit program counter
- multitasking capabilities: context switching with register group pointers

diagram labels: st9 cpu; alu symbols + / x; sample instructions ldw rrd,rrs and add rd,#2
7.3.6 h8/300 mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8/16-bit data/program memory bus
- off-chip buses: 8/16-bit address bus; 8-bit data/program memory bus; the two buses are multiplexed with ports

memory space: von neumann organization
- linear data/program memory space: 64 kbytes; upper 176-byte on-chip register field; additional 16-byte on-chip register field; first 48-byte interrupt vector table = 21 interrupts

arithmetic logic unit: 8-bit datapath
- 8/16-bit operations
- special functions: 8x8 unsigned multiplication 14 cycles; 16/8 unsigned division 14 cycles

instruction processing: standard
- sequential processing

cpu internal buses
- 16-bit address bus, 16-bit data bus
- 8-bit register bus (to be confirmed)

instruction set: risc encoding
- cpi: 2 cycles to 24 cycles
- average cpi: between 5 and 6 cycles
- il: 2 bytes or 4 bytes (even)
- average il: between 2 and 3 bytes
- special addressing modes: register access as bit, 4-bit, byte, or word; register indirect with 16-bit offset or with pre/post-increment
- special instructions: block moves

programming model: register file
- general registers: sixteen 8-bit registers useable as eight 16-bit registers, including one 16-bit stack pointer
- other registers: 16-bit program counter; condition code register

diagram labels: h8/300 cpu; alu symbols + / x; sample instructions move rd,rs and add rd,#2
7.3.7 68hc11 mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8-bit data/program memory bus
- off-chip buses: 8/16-bit address bus; 8-bit data/program memory bus; the two buses are multiplexed with ports

memory space: von neumann organization
- linear data/program memory space: 64 kbytes; 256-byte zero page; 64-byte peripheral register space; upper 41-byte interrupt vector table = 18 interrupts

arithmetic logic unit: 8-bit datapath
- 8/16-bit operations
- special functions: 8x8 unsigned multiplication 10 cycles; 16/16 unsigned integer division 41 cycles; 16/16 unsigned fractional division 41 cycles

instruction processing: standard
- sequential processing

cpu internal buses
- 16-bit address bus, 8-bit data bus (to be confirmed)

instruction set: cisc encoding
- cpi: 2 cycles to 41 cycles
- average cpi: between 6 and 7 cycles
- il: 1 byte to 3 bytes
- average il: between 2 and 3 bytes
- special instructions: exchange register contents

programming model: accumulators
- two 8-bit accumulators useable as one 16-bit accumulator
- two 16-bit index registers
- other registers: 16-bit program counter; 16-bit stack pointer; condition code register

diagram labels: 68hc11 cpu; alu symbols + / x; sample instructions ldaa #8,x and addb #a0
7.3.8 68hc08 mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8-bit data/program memory bus
- off-chip buses: 8/16-bit address bus, up to 22-bit with memory expansion module; 8-bit data/program memory bus; the two buses are multiplexed with ports

memory space: von neumann organization
- linear data/program memory space: 64 kbytes, up to 4 mbytes with memory expansion module; 256-byte zero page; 58-byte peripheral register space, direct addressable; upper 256-byte interrupt vector table = 128 interrupts

arithmetic logic unit: 8-bit datapath
- 8-bit operations
- special functions: 8x8 unsigned multiplication 5 cycles; 16/8 unsigned integer division 7 cycles

instruction processing: prefetch mechanism
- 1-byte queue: opcode lookahead register

cpu internal buses
- 16-bit address bus, 8-bit data bus (to be confirmed)

instruction set: cisc encoding
- cpi: 1 cycle to 9 cycles
- average cpi: between 4 and 5 cycles
- il: 1 byte to 4 bytes
- average il: between 2 and 3 bytes
- special addressing modes: indexed with 8-bit offset and post-increment; stack pointer relative (8/16-bit offset)
- special instructions: compare & branch like; decrement & branch like; memory-to-memory moves (direct to direct, direct <-> indexed with post-increment)

programming model: accumulator
- one 8-bit accumulator
- one 16-bit index register
- other registers: 16-bit program counter; 16-bit stack pointer; condition code register

diagram labels: 68hc08 cpu; alu symbols + / x; sample instructions lda #8,x and add #a0
7.3.9 st7 mcu core

on-chip/off-chip buses
- on-chip buses: 16-bit address bus; 8-bit data/program memory bus

memory space: von neumann organization
- linear data/program memory space: 64 kbytes; 256-byte zero page; 128-byte peripheral register space, direct addressable; upper 32-byte interrupt vector table = 14 interrupts

arithmetic logic unit: 8-bit datapath
- 8-bit operations
- special functions: 8x8 unsigned multiplication 11 cycles

instruction processing: standard
- sequential processing

cpu internal buses
- 16-bit address bus, 8-bit data bus (to be confirmed)

instruction set: cisc encoding
- cpi: 2 cycles to 12 cycles
- average cpi: between 4 and 5 cycles
- il: 1 byte to 4 bytes
- average il: between 2 and 3 bytes
- special addressing modes: indirect (short/long)

programming model: accumulator
- one 8-bit accumulator
- two 8-bit index registers
- other registers: 16-bit program counter; 16-bit stack pointer; condition code register

diagram labels: st7 cpu; alu symbols + x; sample instructions ld (x),a and add a,#a0
7.3.10 80c51 mcu core

on-chip/off-chip buses
- on-chip buses: 8/16-bit address bus; 8-bit data memory bus; 8-bit program memory bus
- off-chip buses: 8/16-bit address bus; 8-bit data/program memory bus; the two buses are multiplexed; the two buses are multiplexed with ports

memory spaces: harvard organization
- linear data/program memory space
- data memory space: 64 kbytes; first 128-byte zero page; lowest 32-byte banked register space; 16-byte bit addressable space
- special function register space (logically separate): 128-byte special function register space, direct addressable only
- program memory space: 64 kbytes; first 128-byte zero page; first 24-byte interrupt vector table = 5 interrupts

arithmetic logic unit: 8-bit datapath
- 8-bit operations
- special functions: 8x8 unsigned multiplication 48 cycles; 16/8 unsigned division 48 cycles

instruction processing: standard
- sequential processing

core internal buses
- 16-bit address bus, 8-bit data bus (to be confirmed)

instruction set: cisc encoding
- cpi: 12 cycles to 48 cycles
- average cpi: between 18 and 20 cycles
- il: 1 byte to 3 bytes
- average il: between 1 and 2 bytes
- special addressing modes: 16-bit addressing with data pointer register; register/stack pointer/data pointer register indirect; stack pointer relative
- special instructions: exchange accumulator and register/direct byte; compare/decrement & branch like; bit test & bit clear & jump; memory-to-memory moves (direct to direct, direct to indirect)

programming model: register file & accumulator
- general registers: 4 banks of eight 8-bit registers; they are mapped in data memory
- special function registers: one 8-bit accumulator; 16-bit program counter; 16-bit data pointer register useable as two 8-bit registers; 8-bit stack pointer; condition code register; peripheral registers; they are mapped in data memory
- multitasking capabilities: context switching with banked registers

diagram labels: 80c51 cpu; alu symbols + / x; sample instructions mov a,(r1) and add a,#a0
7.3.11 ks88 mcu core

on-chip/off-chip buses
- on-chip buses: 8/16-bit address bus; 8-bit program memory bus; 8-bit register bus
- off-chip buses: 8/16-bit address bus; 8-bit data/program memory bus; the two buses are multiplexed; the two buses are multiplexed with ports

memory spaces: von neumann organization
- register file space: 192-byte prime data register space (all addr. modes)
- 64-byte register set 1: 16-byte working register space (working reg. addr.); 16-byte system register space (register addressing); 32-byte system & peripheral control register space (register addressing)
- 64-byte register set 2: 64-byte data register space (indirect, indexed, stack)
- linear data/program memory space: 64 kbytes; first 16-kbyte program memory only; first 256-byte interrupt vector table = 128 interrupts

arithmetic logic unit: 8-bit datapath
- 8-bit operations
- special functions: 8x8 unsigned multiplication 24 cycles; 16/8 unsigned division 28 cycles

instruction processing: standard
- sequential processing

core internal buses
- 16-bit address bus, 8-bit data bus
- 8-bit register bus (to be confirmed)

instruction set: cisc encoding
- cpi: 6 cycles to 28 cycles
- average cpi: between 10 and 12 cycles
- il: 1 byte to 3 bytes
- average il: between 2 and 3 bytes
- special addressing modes: register pair (two 8-bit registers as one 16-bit); indirect address/register; indexed (short/long)
- special instructions: compare & increment & branch like; decrement & branch like

programming model: register file
- prime registers: 192 8-bit prime data registers
- two register sets: register set 1 (sixteen 8-bit working registers, sixteen 8-bit system registers, 32 8-bit system & peripheral control registers); register set 2 (64 registers)
- other registers: 16-bit program counter; system and user stack pointers
- multitasking capabilities: context switching with register sets; system and user modes

diagram labels: ks88 cpu; alu symbols + / x; sample instructions move rd,rs and add rd,#2
7.3.12 78K0 MCU core

On-chip/off-chip buses
- On-chip buses: 8/16-bit address bus; 8-bit data memory bus; 8-bit program memory bus
- Off-chip buses: 8/16-bit address bus; 8-bit data/program memory bus; the two buses are multiplexed; the two buses are multiplexed with ports

Memory space: von Neumann organization
- Linear data/program memory space, 64 Kbytes
  Upper 256-byte special function register space: peripheral registers; SFR addressing
  Following 32-byte general register space: register addressing
  256-byte zero page: straddles SFR/register/RAM spaces
  First 64-byte interrupt vector table = 14 interrupts

Arithmetic logic unit: 8-bit datapath
- 8-bit operations
- Special functions: 8x8 unsigned multiplication, 32 cycles; 16/8 unsigned division, 50 cycles

Instruction processing: standard
- Sequential processing

Core internal buses: 16-bit address bus, 8-bit data bus (to be confirmed)

Instruction set: CISC encoding
- CPI: 4 cycles to 50 cycles
- Average CPI: between 14 and 16 cycles
- IL: 1 byte to 4 bytes
- Average IL: between 2 and 3 bytes
- Special addressing modes: register indirect; indexed with 8-bit offset; stack pointer relative
- Special instructions: decrement & branch like

Programming model: register file & accumulator
- General registers: 4 banks of eight 8-bit registers; useable as four 16-bit registers; second register is the accumulator; they are memory mapped
- CPU special function registers: 16-bit program counter; 16-bit stack pointer; program status word
- Multitasking capabilities: context switching with banked registers

Diagram: 78K0 CPU (ALU: + / x), with example instructions mov a,(r1) and add a,#a0
7.4 Instruction cycle time chart

The following chart (figure 6) presents the complete and average instruction cycle time (ICT) ranges for the different MCUs. The complete range goes from the minimum to the maximum complete ICT. The average ICT range goes from the minimum to the maximum average ICT. For an explanation of the calculation, see 7.2.2 average ICT/CPI and IL.

7.5 Instruction length chart

The following chart (figure 7) presents the complete and average instruction length (IL) ranges for the different MCUs. The complete range goes from the minimum to the maximum complete IL. The average IL range goes from the minimum to the maximum average IL. For an explanation of the calculation, see 7.2.2 average ICT/CPI and IL.
Figure 6. Complete and average instruction cycle time ranges

The chart plots, for each MCU, the complete ICT range and the average ICT range in nanoseconds (axis from 0 to 3000), distinguishes 8-bit, 16-bit and 8/16-bit MCUs, and marks the direction of best performance. The values labelled on the chart are:

MCU (frequency)      Minimum (ns)   Maximum (ns)
80C51XA (20 MHz)          100           1200
68HC16 (16 MHz)           125           2375
68HC12 (8 MHz)            125           1625
ST9+ (25 MHz)             160           1040
ST9 (12 MHz)              500           3670
H8/300 (10 MHz)           200           2400
68HC11 (4 MHz)            500          10250
68HC08 (8 MHz)            125           1125
ST7 (8 MHz)               250           1500
ST7 (4 MHz)               500           3000
80C51 (20 MHz)            600           2400
KS88 (8 MHz)              750           2500
78K0 (10 MHz)             400           5000
Figure 7. Complete and average instruction length ranges

The chart plots, for each MCU, the complete IL range and the average IL range in bytes (axis from 0 to 7), distinguishes 8-bit, 16-bit and 8/16-bit MCUs, and marks the direction of best density. The values labelled on the chart are:

MCU (frequency)      Minimum (bytes)   Maximum (bytes)
80C51XA (20 MHz)            2                 6
68HC16 (16 MHz)             2                 6
68HC12 (8 MHz)              1                 5
ST9+ (25 MHz)               1                 6
ST9 (12 MHz)                1                 6
H8/300 (10 MHz)             2                 4
68HC11 (4 MHz)              1                 3
68HC08 (8 MHz)              1                 4
ST7 (8 MHz)                 1                 4
ST7 (4 MHz)                 1                 4
80C51 (20 MHz)              1                 3
KS88 (8 MHz)                1                 3
78K0 (10 MHz)               1                 4
8 Description of the test routines

This section describes the test routines more precisely. For each test, it details the algorithm, its implementation and the features which it stresses.

8.1 Eratosthenes sieve

Algorithm
The Eratosthenes sieve is a well-known algorithm which searches for the prime numbers greater than or equal to 3 among n elements (n=8189 has been chosen arbitrarily).

Implementation
Even numbers greater than 2 are not prime, so this algorithm only looks for prime numbers in an array of odd numbers. We have chosen an array of 8189 elements. It represents the odd numbers from 3 to 16379. The array is initialized with the value 'true' ('true' = 0). An element is then set to 1 ('false') if the corresponding number is not a prime number, and is left unmodified (it keeps the value 0 = 'true') if it is a prime number. Do not forget that it is an array of odd numbers: array[j] corresponds to 2j+3.
At the beginning of the routine, each number is a potential prime number (initialization value is 'true'). The algorithm scans the array in ascending order and, for every prime number found, sets the odd multiples of that prime number to 'false'.

Features stressed
This test measures the elementary computational capability and the ability to manipulate data in an array.

8.2 Ackermann function

Algorithm
The Ackermann function is a two-parameter function, acker(m,n), which induces a large number of recursive calls.

Implementation
This test routine is performed with two different pairs of parameters: acker(3,5) and acker(3,6). For instance, with the parameters m=3 and n=6, the function induces 172,233 procedure calls.

Features stressed
It tests the efficiency of recursive procedure calls and of stack usage.

8.3 String search

Algorithm
The string search consists in searching for a 16-byte string in a 128-character array.

Implementation
The data are predefined with the following contents. For the 128-character array:
xxxxxxxxpatterxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (64 bytes)
xxxxxxxxxxxxxxxxxpattern is here!xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx (64 bytes)
and for the 16-byte string:
pattern is here! (16 bytes)
The searching algorithm looks for the first matching character in the array and then compares the rest of the string. If the searched string is found, it returns the address of the first character of the string in the array.

Features stressed
This program measures the efficiency of data comparison and string manipulation.
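The sieve of 8.1 can be sketched in a few lines of Python. This is only an illustration of the description above, not the assembly routine used for the benchmark; the function names `odd_sieve` and `primes_from_flags` are hypothetical.

```python
def odd_sieve(n):
    """Sieve over the odd numbers 3, 5, ..., 2*(n-1)+3.

    flags[j] stands for the number 2*j + 3.
    0 means 'true' (prime), 1 means 'false' (not prime), as in 8.1.
    """
    flags = bytearray(n)           # every entry starts at 0: potential prime
    for j in range(n):             # scan the array in ascending order
        if flags[j] == 0:          # 2*j + 3 is prime
            prime = 2 * j + 3
            k = j + prime          # index of 3 * prime, the first odd multiple
            while k < n:
                flags[k] = 1       # odd multiple of a prime: not prime
                k += prime         # the next odd multiple is 2 * prime further on
    return flags


def primes_from_flags(flags):
    """Translate the flag array back into the list of primes it encodes."""
    return [2 * j + 3 for j, flag in enumerate(flags) if flag == 0]


small = primes_from_flags(odd_sieve(10))   # odd numbers 3 to 21
print(small)
```

Called with n=8189, the same function covers the odd numbers from 3 to 16379 used in the benchmark. Stepping the index by `prime` moves from one odd multiple to the next, because consecutive array elements are 2 apart.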
8.4 Character search

Algorithm
The character search consists in searching for a byte in a 40-byte block.

Implementation
The data are also predefined. The algorithm searches for the byte 'o' in the 40-byte block
-------------------------------o--------
where the character 'o' is the 32nd character of the block.

Features stressed
Like the string search, this program measures the efficiency of data comparison.

8.5 Bubble sort

Algorithm
The bubble sort benchmark sorts a one-dimensional array of 16-bit integers.

Implementation
The test is performed with 10 words and then with 600 words. The array is initialized with 10 or 600 words (16-bit integers) in reverse order. The algorithm is a classic bubble sort which arranges the 10 words (or the 600 words) in ascending order of magnitude.
Note that the routine used is intentionally almost the same for the two values (although it could have been optimized for the first value). A few differences may exist, but they do not modify the way the test is done.

Features stressed
This benchmark demonstrates the efficiency of data comparison and data manipulation, and especially of 16-bit value comparison and 16-bit value manipulation.

8.6 Block move

Algorithm
The block move test routine transfers a block from one place in memory to another.

Implementation
This program is tested with a 64-byte block and with a 512-byte block.
Note that the routine used is intentionally almost the same for the two values (although it could have been optimized for the first value). A few differences may exist, but they do not modify the way the test is done.

Features stressed
It shows the ability to manipulate data blocks.

8.7 Block translation

Algorithm
The convert test routine transfers a block from one place in memory to another, translating each element on the way.

Implementation
It uses a table to convert the source block into the destination block. The table contains the translation of the source block elements. This benchmark is useful, for example, to convert from ASCII code to EBCDIC code.

Features stressed
Like the block move test program, it shows the ability to manipulate data blocks, but also the ability to use a lookup table.

8.8 16-bit integer multiplication

Algorithm
The 16-bit integer multiplication program multiplies two unsigned words (16-bit integers), giving a 32-bit result.

Implementation
The two operands chosen here are both 256, so that the multiplication performed is: 256 x 256 = 65536 (= 10000h in hexadecimal).

Features stressed
This test measures the computational capability of the microcontroller with 16-bit integers.
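The note does not say how each core carries out the multiplication of 8.8. On a core that only offers an 8x8 unsigned multiplication, one common approach is to combine four 8-bit partial products. The sketch below assumes that approach; the function name `mul16x16` is hypothetical.

```python
def mul16x16(a, b):
    """Multiply two unsigned 16-bit words using only 8x8 -> 16-bit products.

    Assumption: the core has no 16x16 multiply, so the 32-bit result is
    assembled from four partial products, each shifted to its byte position.
    """
    if not (0 <= a <= 0xFFFF and 0 <= b <= 0xFFFF):
        raise ValueError("operands must be unsigned 16-bit values")
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    result = a_lo * b_lo              # low byte x low byte
    result += (a_lo * b_hi) << 8      # cross products land one byte higher
    result += (a_hi * b_lo) << 8
    result += (a_hi * b_hi) << 16     # high byte x high byte lands two bytes higher
    return result                     # always fits in 32 bits


product = mul16x16(256, 256)          # the operands used by the benchmark
print(hex(product))
```

With the benchmark operands, only the high-byte product is non-zero, which yields 10000h. A core with a native 16x16 multiplication would replace the whole function with a single instruction.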
8.9 16-bit value right shift

Algorithm
The 16-bit value right shift routine shifts a 16-bit value five places to the right.

Implementation
The operand to be shifted is 40h (hexadecimal value). It is treated as a 16-bit integer, and it is the 16-bit value which is shifted.

Features stressed
This test measures the word (16-bit) and bit manipulation capability.

8.10 Bit manipulation

Algorithm
The bit manipulation benchmark performs the set, the reset, and the test of 3 bits in a 128-bit array.

Implementation
The memory where some bits will be set, reset, and tested is initialized with the value 'AAh' (hexadecimal value). It is composed of 8 words '0AAAAh', which represents a 16-byte memory area, that is to say a 128-bit array.
The test consists in setting, resetting, and then testing the 10th bit of the array, then the 13th bit of the array, and then the 123rd bit of the array.
Setting a bit means setting it to 1. Resetting a bit means resetting it to 0. Testing a bit means testing it and setting it to 1 if it is zero (with the zero flag Z also set if it is zero).

Features stressed
This benchmark measures the computational capability and the efficiency of bit manipulation.

8.11 Timer interrupt

Algorithm
The timer interrupt benchmark is composed of two routines performing an input capture interrupt and an input capture/output compare interrupt.

Implementation
The first routine is the body of an interrupt service routine handling a timer input capture. The second is the body of an interrupt service routine handling a timer input capture or an output compare; as the interrupt vectors can be separate, this routine may be composed of two different parts.
The routines include:
• the average instruction (that is, an instruction lasting the average instruction cycle time) which is interrupted, and the interrupt entry process (together they represent the interrupt latency)
• the body of a typical interrupt service routine, including the following operations:
  - stack two registers or change register bank (if not done by interrupt processing)
  - read timer register
  - call to a subroutine with input capture register content as input parameter or output compare register content as output parameter
  - return from subroutine
  - unstack registers or restore register bank (if not done by interrupt processing)
  - return from interrupt
It is true that each MCU has its own specific manner of handling interrupts. Reading the timer register and using the input capture/output compare as a parameter for a function call has been judged a satisfactory common basis. Thus, it has been chosen as the routine body.

Features stressed
This benchmark measures the interrupt processing performance.
9 Measurement proceeding and calculation

This section describes measurement proceeding and calculation for the computing performance test routines only. Interrupt processing performance test routines are not covered here (see 6.2 core interrupt processing performance for details on measurement and calculation).

9.1 Measurement proceeding

The parameters measured are execution time and code size. The first has been measured on MCU boards (with an oscilloscope) whenever possible, or otherwise calculated from the assembly code. The second has been measured on the assembly code.
To facilitate execution time measurement, the assembly code has been divided into two parts. The first, called assignments & initializations in the source code, contains the initialization of the MCU and of the data, and then a call to the test routine, which is included in the second part, called test loop. The first part ends with an infinite loop. The execution time and code size are measured on the test loop part.

9.1.1 Execution time measure

An I/O pin is used to make the measurement with a digital oscilloscope. This I/O pin is configured as a push-pull output, and interrupts are disabled in the initialization part. The pin used for each MCU is detailed in table 13.

Table 13. I/O pins for execution time measuring

MCU name     I/O pin for measure
80C51XA      pin 0 of port 2
68HC16       pin 2 of port E
68HC12       pin 7 of port E
ST9+         pin 0 of port 4
ST9          pin 0 of port 4
H8/300       pin 0 of port 6
68HC11       pin 0 of port B
68HC08       pin 0 of port A
ST7          pin 0 of port B
80C51        pin 0 of port 1
KS88         pin 0 of port 2 (for 88C0504), pin 0 of port 4 (for 88C0116)
78K0         pin 0 of port 2

The test loop routine begins by setting the I/O pin. This marks the beginning of the test routine and so the start of the measurement on the oscilloscope (trigger on positive edge). The following lines are the implementation of the algorithm. This part ends with the reset of the I/O pin and a return from the call. The execution time is the length of the pulse captured on the oscilloscope. Figure 8 shows how the execution time measurement proceeds.
Note that it was sometimes not possible to implement all the tests on an MCU (see 9.2.2 memory considerations). In some of these cases, test routines have nevertheless been written and the execution time has been calculated theoretically. The theoretical execution time is simply given by dividing the number of clock cycles, calculated from the assembly source, by the internal processing frequency:

theoretical execution time = number of clock cycles / internal clock frequency

Note that experience has shown these theoretical calculations to be accurate when compared with real measurements. Thus results of both types can be compared.

Figure 8. Execution time measurement proceeding
- Assignments & initializations: ..., reset I/O pin, ..., infinite loop
- Test loop: set I/O pin, ..., reset I/O pin
- Oscilloscope screen: the test routine appears as a pulse whose length is the execution time
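The theoretical execution time formula can be checked with a short calculation. The sketch below is an illustration only; the function name `theoretical_time_ns` is hypothetical, and it takes the 20 MHz shown for the 80C51 in figure 6 as the internal clock frequency, an assumption that is consistent with the range plotted there.

```python
def theoretical_time_ns(clock_cycles, internal_clock_hz):
    """Theoretical execution time in nanoseconds.

    Implements: time = number of clock cycles / internal clock frequency.
    """
    if internal_clock_hz <= 0:
        raise ValueError("internal clock frequency must be positive")
    return clock_cycles * 1e9 / internal_clock_hz


# 80C51 instruction timings from 7.3.10: CPI ranges from 12 to 48 cycles
shortest = theoretical_time_ns(12, 20_000_000)
longest = theoretical_time_ns(48, 20_000_000)
print(shortest, longest)
```

With these inputs the result is 600 ns and 2400 ns, the two ends of the complete ICT range given for the 80C51 in figure 6. For a whole test routine, `clock_cycles` would be the sum of the cycle counts of every instruction executed between the set and the reset of the I/O pin.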
9.1.2 Code size measure

Code size is measured on the assembly code. The result is the number of bytes used to code the test routine (in the test loop part), without the set and reset instructions for the I/O pin. Here is an example of a test loop:

    0000 d290    test:   setb p1.0               ; set i/o pin
    0002 7809            mov  r0, #srcpointer    ; beginning of test routine
    0004 7982            mov  r1, #destpointer
    0006 900200          mov  dptr, #200h
    0009 7f79            mov  r7, #121
    000b e6      loop:   mov  a, @r0
    000c 93              movc a, @a+dptr
    000d f7              mov  @r1, a
    000e 08              inc  r0
    000f 09              inc  r1
    0010 dff9            djnz r7, loop           ; end of test routine
    0012 c290    finish: clr  p1.0               ; reset i/o pin
    0014 22              ret

The code size of this assembly code equals (12h-2h) = 10h = 16d, thus 16 bytes.

9.2 Calculation

9.2.1 Execution time and code size ratios

From the execution time and code size measurements, preliminary ratios with the ST9+ MCU as reference have been calculated for each test. Using those results, a global execution time ratio and a global code size ratio have been calculated as the average of all ratios.
As not all the tests could be implemented on all MCUs (see 9.2.2 memory considerations), one or two different results are presented for each MCU. The first one, available for all the MCUs, has been calculated with the reduced set of tests performed on all the MCUs (table 14). The second one, only available for some MCUs, has been calculated with the full set of tests (table 15).

Table 14. Reduced set of tests

Tests concerned:
string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright, bitrst

Resulting ratio formulas (ET = execution time, CS = code size):
global ET ratio for reduced set = sum(ET ratios of reduced set) / number of tests of reduced set
global CS ratio for reduced set = sum(CS ratios of reduced set) / number of tests of reduced set
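The global ratio of table 14 is a plain arithmetic mean of the per-test ratios. The sketch below illustrates the calculation under two assumptions: each per-test ratio is taken as the MCU value divided by the ST9+ value (the note only says that the ST9+ is the reference), and the execution times are made-up placeholders, not measurements from this study. The names `REDUCED_SET` and `global_ratio` are hypothetical.

```python
# Test names of the reduced set, as listed in table 14
REDUCED_SET = ["string", "char", "bubble(10 words)", "blkmov(64 bytes)",
               "convert", "16mul", "shright", "bitrst"]


def global_ratio(mcu_values, reference_values, tests=REDUCED_SET):
    """Average of the per-test ratios against the reference MCU.

    Assumption: ratio = value for the MCU under test / value for the reference.
    """
    ratios = [mcu_values[test] / reference_values[test] for test in tests]
    return sum(ratios) / len(ratios)


# Placeholder execution times in microseconds (NOT results from the note)
st9plus_et = {test: 10.0 for test in REDUCED_SET}
other_mcu_et = {test: 10.0 for test in REDUCED_SET[:4]}
other_mcu_et.update({test: 30.0 for test in REDUCED_SET[4:]})

et_ratio = global_ratio(other_mcu_et, st9plus_et)
print(et_ratio)
```

The same function applies to code sizes, and to the full set of tests by passing a longer `tests` list. Because every test has the same weight, a single test with an extreme ratio can noticeably shift the global figure.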
Table 15. Full set of tests

Tests concerned:
string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright, bitrst, sieve, acker(3,5), acker(3,6), bubble(600 words), blkmov(512 bytes)

Resulting ratio formulas (ET = execution time, CS = code size):
global ET ratio for full set = sum(ET ratios of full set) / number of tests of full set
global CS ratio for full set = sum(CS ratios of full set) / number of tests of full set

9.2.2 Memory considerations

The location (internal or external) of the memory used by the MCU for the stack has an indirect effect on the results. As all the MCUs have internal memory but do not all have external memory, internal memory has been used for most of the tests. But because some tests (especially the Ackermann function) require a large stack capacity, alternative solutions have been worked out. Here is a summary of the different cases:
• for tests with a limited memory need, internal memory has been used as stack
• for tests with a large memory need,
  - for MCUs with a large internal memory available, internal memory has been used
  - for MCUs with limited internal memory but with external memory (with identical access time) available, external memory has been used
  - for MCUs with limited internal memory and external memory with a longer access time, no real measurement has been made, in order not to disadvantage some MCUs; in some of these cases, theoretical values have been calculated from the assembly code
  - note that theoretical results are close to practical results with internal memory
A small number of tests could not be implemented on some MCUs for various reasons.
As theoretical results are close to actual results with internal memory (see 9.1.1 execution time measure), there are only two main cases (for each MCU):
• tests which have been performed (theoretically, or practically with internal or external memory)
• tests which have not been implemented (for various reasons)
As a matter of fact, there are two different sets of tests:
• the reduced set of tests performed on all the MCUs
• the full set of tests performed only on some MCUs
A quick look at the results shows that the ratios obtained using both sets of tests are not very different (see 4.1 preliminary remark).

Information furnished is believed to be accurate and reliable. However, SGS-THOMSON Microelectronics assumes no responsibility for the consequences of use of such information nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under any patent or patent rights of SGS-THOMSON Microelectronics. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. SGS-THOMSON Microelectronics products are not authorized for use as critical components in life support devices or systems without the express written approval of SGS-THOMSON Microelectronics.
© 1997 SGS-THOMSON Microelectronics - All rights reserved.
SGS-THOMSON Microelectronics group of companies: Australia - Brazil - Canada - China - France - Germany - Hong Kong - Italy - Japan - Korea - Malaysia - Malta - Morocco - The Netherlands - Singapore - Spain - Sweden - Switzerland - Taiwan - Thailand - United Kingdom - U.S.A.

